Quantum process tomography with unsupervised learning and tensor networks

The impressive pace of advance of quantum technology calls for robust and scalable techniques for the characterization and validation of quantum hardware. Quantum process tomography, the reconstruction of an unknown quantum channel from measurement data, remains the quintessential primitive to completely characterize quantum devices. However, due to the exponential scaling of the required data and classical post-processing, its range of applicability is typically restricted to one- and two-qubit gates. Here, we present a technique for performing quantum process tomography that addresses these issues by combining a tensor network representation of the channel with a data-driven optimization inspired by unsupervised machine learning. We demonstrate our technique through synthetically generated data for ideal one- and two-dimensional random quantum circuits of up to 10 qubits, and a noisy 5-qubit circuit, reaching process fidelities above 0.99 using several orders of magnitude fewer (single-qubit) measurement shots than traditional tomographic techniques. Our results go far beyond the state of the art, providing a practical and timely tool for benchmarking quantum circuits in current and near-term quantum computers.


Tensor networks for Choi matrices
Due to the intrinsic exponential scaling of exact classical representations of quantum states and operators, a direct estimation of the Choi matrix given the knowledge of the channel is limited to a very small number of qubits. One may wonder whether, for a subset of quantum channels that admit an efficient representation (e.g. low-depth quantum circuits), the Choi matrix can also be computed efficiently. To this end, we adopt a tensor network representation of the Choi matrix, which for an N-qubit quantum channel is a rank-4N tensor, with 2N input indices and 2N output indices [1]. In its canonical form, the input and output indices are placed on the upper and lower halves of the full tensor, respectively.
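To make the index structure concrete, the following numpy sketch (purely illustrative; the index ordering and the Hadamard example are our own choices and need not match Supplementary Fig. 1) builds the unnormalized Choi matrix of a single-qubit unitary channel and reshapes it into the corresponding rank-4 tensor:

```python
import numpy as np

# Purely illustrative (assumed index ordering): unnormalized Choi matrix of the
# single-qubit unitary channel E(rho) = U rho U^dagger, built from the
# unnormalized Bell pair |Phi+> = sum_i |i>|i>.
d = 2
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)         # example gate (Hadamard)

phi = np.eye(d).reshape(d * d)                       # sum_i |i>|i> (unnormalized)
IU = np.kron(np.eye(d), U)                           # act with U on the second copy
choi = IU @ np.outer(phi, phi.conj()) @ IU.conj().T  # d^2 x d^2 Choi matrix

# Reshape into a rank-4 tensor; for N qubits this becomes a rank-4N tensor
# with 2N input and 2N output indices.
choi_tensor = choi.reshape(d, d, d, d)

# Trace preservation: tracing over the output index pair leaves the identity.
print(np.allclose(np.einsum('iaja->ij', choi_tensor), np.eye(d)))
```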
We simplify the discussion and assume that the quantum channel is noiseless, i.e. it implements a unitary evolution E : ρ −→ U ρ U†, where U corresponds to a quantum circuit compiled into one- and two-qubit gates. Depending on the type of gates, the geometry of the circuit and its depth, the unitary U may admit an efficient representation as a matrix product operator (MPO),

$$U^{\sigma\sigma'} = \sum_{\{\mu\}} B_1^{\sigma_1\sigma_1'\mu_1}\, B_2^{\sigma_2\sigma_2'\mu_1\mu_2} \cdots B_N^{\sigma_N\sigma_N'\mu_{N-1}},$$

where U^{σσ'} = ⟨σ|U|σ'⟩. Each B_j is a rank-4 tensor with physical indices (σ_j, σ'_j) and bond indices (μ_{j-1}, μ_j) (Supplementary Fig. 1a). The bond dimension of the MPO is defined as the maximum dimension of any bond index, χ_U = max_j dim[μ_j], and it provides a measure of the complexity of the MPO representation of the unitary U.
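The following sketch (plain numpy, not the authors' implementation; the identity MPO, the CNOT staircase and the index ordering are assumptions made for illustration) shows where χ_U comes from: applying a two-qubit gate to an MPO and restoring the MPO form with an SVD sets the dimension of the shared bond index:

```python
import numpy as np

# Illustrative sketch: start from the identity MPO on N qubits, apply two-qubit
# gates, and restore the MPO form with an SVD. The largest bond dimension
# encountered is the chi_U discussed in the text.
N, d = 4, 2

# Each MPO tensor B_j carries indices (mu_{j-1}, sigma_j, sigma'_j, mu_j).
mpo = [np.eye(d).reshape(1, d, d, 1) for _ in range(N)]

def apply_two_qubit_gate(mpo, gate, j, cutoff=1e-12):
    """Apply a (d^2 x d^2) gate to the outputs of sites j, j+1 and re-split."""
    A, B = mpo[j], mpo[j + 1]
    theta = np.einsum('lsxm,mtyr->lstxyr', A, B)      # merge the two sites
    g = gate.reshape(d, d, d, d)                      # g[a,b,s,t] = <ab|G|st>
    theta = np.einsum('abst,lstxyr->labxyr', g, theta)
    theta = theta.transpose(0, 1, 3, 2, 4, 5)         # order (l, a, x, b, y, r)
    l, r = A.shape[0], B.shape[3]
    u, s, vh = np.linalg.svd(theta.reshape(l * d * d, d * d * r),
                             full_matrices=False)
    keep = s > cutoff                                 # discard zero singular values
    u, s, vh = u[:, keep], s[keep], vh[keep, :]
    mpo[j] = u.reshape(l, d, d, s.size)
    mpo[j + 1] = (np.diag(s) @ vh).reshape(s.size, d, d, r)

cnot = np.eye(4)[[0, 1, 3, 2]]                        # example two-qubit gate
for j in range(N - 1):                                # a staircase of CNOTs
    apply_two_qubit_gate(mpo, cnot, j)

chi_U = max(t.shape[-1] for t in mpo)                 # MPO bond dimension
print("bond dimensions:", [t.shape[-1] for t in mpo[:-1]], "chi_U =", chi_U)
```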
The straightforward way to obtain a tensor network representation of the Choi matrix (for a unitary circuit) is to simply apply the circuit MPO to the (unnormalized) density operator Φ = |Φ+⟩⟨Φ+|^⊗N of the N unnormalized Bell pairs, each described by a matrix product state (MPS) with bond dimension χ_{Φ+} = 2 (Supplementary Fig. 1b). In doing so, there is freedom in how the contraction between the circuit MPO and the Bell-state MPS is performed, stemming from different arrangements of the indices of Φ. If one were to pursue the Choi matrix in its canonical form, the MPS indices would have to be properly swapped before the contraction with the MPO (Supplementary Fig. 1c). This, however, results in a very inefficient tensor network representation, as it brings the tensor product of N Bell pairs into a 2N-qubit maximally entangled state, which saturates the bond dimension of the MPS to χ_Φ = 2^N.
In practice, there is no particular reason to keep the Choi matrix in its canonical form, and an efficient representation is instead obtained as follows. First, we contract the circuit MPO with the physical indices of each Bell pair MPS (leaving out the ancilla qubits). Because the Bell state is equivalent to the vectorization of the identity matrix, the contraction between the circuit MPO and the full MPS simply returns the MPO itself. The inner (ancillary) indices can then be folded back into each local MPO tensor (Supplementary Fig. 1d). The result of this operation, i.e. the Choi matrix, is a rank-1 density matrix written in terms of an MPS with physical dimension d^2 and bond dimension χ_U. Thus, the Choi matrix can be obtained efficiently as long as the MPO bond dimension is sufficiently low. Note that this operation is also called unravelling in the context of column-vectorization of dense matrices [1]. In the case of a tensor product channel E = E_1 ⊗ ··· ⊗ E_N, the Choi matrix obtained in this form is the tensor product of the individual sub-system Choi matrices, Λ_E = Λ_{E_1} ⊗ Λ_{E_2} ⊗ ··· ⊗ Λ_{E_N}. This would not be the case if one adopted the canonical ordering of the indices (Supplementary Fig. 1c), for which the Choi matrix Λ_{E_1⊗E_2⊗···⊗E_N} does not factorize into such a tensor product.
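A minimal sketch of this unravelling step, under assumed index conventions that may differ from Supplementary Fig. 1 by a transposition, is the following:

```python
import numpy as np

# Minimal sketch (assumed conventions, not the authors' code): "unravelling" a
# circuit MPO into the MPS that encodes the rank-1 Choi matrix. Each MPO tensor
# with physical indices (sigma_j, sigma'_j) becomes an MPS tensor with a single
# physical index of dimension d^2; the bond dimension chi_U is unchanged.
d = 2

def mpo_to_choi_mps(mpo):
    """Fold the two physical legs of every MPO tensor into one d^2 leg."""
    return [B.reshape(B.shape[0], d * d, B.shape[-1]) for B in mpo]

# Example: a one-site MPO for a single-qubit Hadamard.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
mps = mpo_to_choi_mps([H.reshape(1, d, d, 1)])

# For a unitary channel the Choi matrix is the rank-1 projector onto the
# vectorized unitary encoded by this MPS.
vec = mps[0].reshape(d * d)
choi = np.outer(vec, vec.conj())
print(choi.shape, np.linalg.matrix_rank(choi))        # (4, 4)  rank 1
```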

Tensor-network gradients
The parameters of the LPDO, the tensor components ϑ = {A_j}, are variationally optimized by minimizing the Kullback-Leibler (KL) divergence

$$D_{\mathrm{KL}} = \sum_{\alpha} P(\alpha) \sum_{\beta} P_{\mathcal{E}}(\beta|\alpha)\,\log\frac{P_{\mathcal{E}}(\beta|\alpha)}{P_{\vartheta}(\beta|\alpha)} \qquad (1)$$

where P(α) is the probability with which the input state ρ_α appears in the data, and P_E(β | α) is the process probability defined in Eq. (3) in the main text. The probability distribution P_ϑ(β | α) associated with the process described by the LPDO is

$$P_{\vartheta}(\beta|\alpha) = \mathrm{Tr}\big[\hat{\Lambda}_{\vartheta}\,(\rho_{\alpha}^{\top} \otimes M_{\beta})\big] = d^{N} Z_{\vartheta}^{-1}\,\tilde{P}_{\vartheta}(\beta|\alpha) \qquad (2)$$

where Λ̂_ϑ = d^N Z_ϑ^{-1} Λ_ϑ is the properly normalized LPDO Choi matrix, and we defined the unnormalized LPDO probability distribution P̃_ϑ(β | α) = Tr[Λ_ϑ (ρ_α^⊤ ⊗ M_β)]. By averaging the negative logarithm of Eq. (2) over the data set D, we obtain the negative log-likelihood

$$C(\vartheta) = -\frac{1}{|D|}\sum_{(\alpha,\beta)\in D}\log P_{\vartheta}(\beta|\alpha) = \log Z_{\vartheta} - \frac{1}{|D|}\sum_{(\alpha,\beta)\in D}\log \tilde{P}_{\vartheta}(\beta|\alpha) - N\log d \qquad (3)$$

where we omitted the constant entropy term of the target distribution. Given the cost function C(ϑ), the LPDO parameters are tuned according to the gradients

$$G_{j} = \frac{\partial C(\vartheta)}{\partial A_{j}^{*}} = \frac{1}{Z_{\vartheta}}\frac{\partial Z_{\vartheta}}{\partial A_{j}^{*}} - \frac{1}{|D|}\sum_{(\alpha,\beta)\in D}\frac{1}{\tilde{P}_{\vartheta}(\beta|\alpha)}\frac{\partial \tilde{P}_{\vartheta}(\beta|\alpha)}{\partial A_{j}^{*}}. \qquad (4)$$

Since, in general, the tensor components {A_j} are complex-valued, one should adopt the Wirtinger derivatives, and update each tensor A_j with the gradient taken with respect to its conjugate value A*_j.

Supplementary Figure 2. Tensor-network optimization. (a) Tensor network contraction to evaluate the normalization Z_ϑ of the LPDO. (b) Tensor network contraction to evaluate the unnormalized probability P̃_ϑ(β | α). (c) Calculation of the gradients of the normalization ∂_ϑ Z_ϑ. First, the left-environment tensors L_j (j = 1, ..., N − 1) are computed sequentially (and stored) by contracting the Kraus index and tracing the input/output index of the LPDO, from left to right. The same is repeated by contracting from right to left for the right-environment tensors R_j (j = 2, ..., N). Then, the gradients of the normalization with respect to the conjugate tensor A*_j are evaluated using the previously stored environment tensors. (d) Calculation of the gradients of the unnormalized probability ∂_ϑ P̃_ϑ(β | α) for one given data point (α, β). As for the normalization, left and right environment tensors are calculated, where now the LPDO is contracted with input and output states corresponding to α and β respectively, and subsequently used to evaluate the gradients with respect to each conjugate tensor A*_j.

The calculation of the gradients proceeds in two steps [2,3]. First, one evaluates the normalization

$$Z_{\vartheta} = \mathrm{Tr}\big[\Lambda_{\vartheta}\big] \qquad (5)$$

with a trace over all the input/output indices of the LPDO (Supplementary Fig. 2a). The gradient of Z_ϑ with respect to the component A*_j corresponds to the tensor network used to compute Z_ϑ with the tensor A*_j removed from it. To reduce the number of tensor contractions required to compute the full set of gradients, one should first calculate and store the set of environment tensors {L_j} and {R_j} shown in Supplementary Fig. 2c, obtained in two sweeps over the LPDO, from left to right and from right to left respectively. The gradients of the normalization with respect to each tensor A*_j are then calculated with a third sweep as

$$\frac{\partial Z_{\vartheta}}{\partial A_{j}^{*}} = L_{j-1}\,A_{j}\,R_{j+1}, \qquad (6)$$

where the contraction runs over the bond, physical and Kraus indices shared between the tensors. The second step repeats this procedure for the data-dependent term in the cost function. For each single data point (α, β), one computes the unnormalized probability P̃_ϑ(β | α) by contracting the LPDO with the corresponding input state ρ_α and measurement operator M_β (Supplementary Fig. 2b). One then sweeps through the LPDO to accumulate the left and right environment tensors, and computes the gradient of the unnormalized probability analogously to the normalization (Supplementary Fig. 2d). The final gradients are simply the average over all data samples.
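The sketch below illustrates this environment-based contraction for Z_ϑ and its gradients on a small random LPDO; it is not the authors' code, and the index ordering (bond, physical, Kraus, bond) of the tensors A_j is an assumption made for the example:

```python
import numpy as np

# Illustrative sketch with assumed index ordering (bond_left, physical, Kraus,
# bond_right). It computes the LPDO normalization Z and the gradients dZ/dA_j*
# using stored left/right environment tensors, as described above.
rng = np.random.default_rng(0)
N, p, k, chi = 4, 4, 2, 3          # sites, physical dim d^2, Kraus dim, bond dim
shapes = [(1 if j == 0 else chi, p, k, chi if j < N - 1 else 1) for j in range(N)]
A = [rng.normal(size=s) + 1j * rng.normal(size=s) for s in shapes]

def transfer(Aj):
    """Contract A_j with its conjugate over the physical and Kraus indices."""
    return np.einsum('lpkr,mpks->lmrs', Aj, Aj.conj())

# Left environments L[j] (to the left of site j) and right environments R[j].
L = [np.ones((1, 1))]
for j in range(N - 1):
    L.append(np.einsum('lm,lmrs->rs', L[-1], transfer(A[j])))
R = [None] * N
R[N - 1] = np.ones((1, 1))
for j in range(N - 1, 0, -1):
    R[j - 1] = np.einsum('lmrs,rs->lm', transfer(A[j]), R[j])

# Normalization Z: the full ladder contraction.
Z = np.einsum('lm,lmrs,rs->', L[0], transfer(A[0]), R[0]).real

# Gradient of Z with respect to A_j*: the same network with A_j* removed,
# i.e. the contraction L_{j-1} . A_j . R_{j+1} of Eq. (6).
grads = [np.einsum('lm,lpkr,rs->mpks', L[j], A[j], R[j]) for j in range(N)]

# Consistency check: Z is linear in A_j*, so contracting any gradient with
# the conjugate tensor must reproduce Z.
assert np.allclose(Z, np.einsum('mpks,mpks->', grads[0], A[0].conj()).real)
```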

Gradient updates
Once the gradients G_ϑ = {G_j} with respect to each tensor component are known, the LPDO is updated using gradient descent. In its simplest form, the parameters are changed according to

$$A_{j} \longrightarrow A_{j} - \eta\, G_{j} \qquad (7)$$

where η is the size of the update (i.e. the learning rate), and the gradients are computed over the full training data set.
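Assuming the gradients G_j of Eq. (4) have already been assembled (e.g. in a list `grads`, combining the normalization and data-dependent terms), the update of Eq. (7) reduces to a sketch like:

```python
# Minimal sketch of the plain gradient-descent update of Eq. (7); `A` and
# `grads` are assumed to hold the LPDO tensors and their full gradients G_j.
eta = 0.01                                   # learning rate (illustrative value)
A = [Aj - eta * Gj for Aj, Gj in zip(A, grads)]
```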